Take Home Ex 1
Overview of the Ex
Objectives
Exploratory Spatial Data Analysis (ESDA) holds tremendous potential to address complex problems facing society. In this study, you are tasked to apply appropriate Local Indicators of Spatial Association (LISA) and Emerging Hot Spot Analysis (EHSA) to uncover the spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.
Task
The specific tasks of this take-home exercise are as follows:
Geovisualisation and Analysis
With reference to the time intervals provided in the table below, compute the passenger trips generated by origin at the hexagon level,
Peak hour period | Bus tap on time
Weekday morning peak | 6am to 9am
Weekday afternoon peak | 5pm to 8pm
Weekend/holiday morning peak | 11am to 2pm
Weekend/holiday evening peak | 4pm to 7pm

Display the geographical distribution of the passenger trips by using appropriate geovisualisation methods,
Describe the spatial patterns revealed by the geovisualisation (not more than 200 words per visual).
Local Indicators of Spatial Association (LISA) Analysis
Compute LISA of the passenger trips generated by origin at the hexagon level.
Display the LISA maps of the passenger trips generated by origin at the hexagon level. The maps should only display the significant results (i.e. p-value < 0.05).
With reference to the analysis results, draw statistical conclusions (not more than 200 words per visual).
Emerging Hot Spot Analysis(EHSA)
With reference to the passenger trips by origin at the hexagon level for the four time intervals given above:
Perform Mann-Kendall Test by using the spatio-temporal local Gi* values,
Prepare EHSA maps of the Gi* values of the passenger trips by origin at the hexagon level. The maps should only display the significant results (i.e. p-value < 0.05).
With reference to the EHSA maps and data visualisation prepared, describe the spatial patterns revealed (not more than 250 words per cluster).
Data
Aspatial data
For the purpose of this take-home exercise, Passenger Volume by Origin Destination Bus Stops downloaded from LTA DataMall will be used.
Geospatial data
Two geospatial data sets will be used in this study, they are:
Bus Stop Location from LTA DataMall. It provides information about all the bus stops currently being serviced by buses, including the bus stop code (identifier) and location coordinates.
hexagon, a hexagon layer of 250m (this distance is the perpendicular distance between the centre of the hexagon and its edges) will be used to replace the relatively coarse and irregular Master Plan 2019 Planning Sub-zone GIS data set of URA.
Loading the Packages

pacman::p_load(sf, spdep, tmap, tidyverse, knitr, mapview, leaflet, leafpop)
Import the Bus Stop Location
busstops = st_read(dsn = "data/geospatial",
layer = "BusStop") %>%
st_transform(crs = 3414)

Reading layer `BusStop' from data source
`C:\zjjgithubb\ISSS624\TakehomeEx\Ex01\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
Creating Hexagonal Tessellation (Honeycomb)
The tessellation is created using the st_make_grid function.
busstops is the input spatial data; st_make_grid() uses its bounding box (the processing extent, in the ArcGIS sense) for the tessellation. The cell size argument, e.g. c(150, 150), is the cell size in the units of the CRS of the spatial data. By default, the bounding box is split into 100 grids in a 10 rows x 10 columns layout. The square argument indicates whether you want a square grid (TRUE) or a hexagonal grid (FALSE).
Caveat: The cell size is in the same unit as the projected coordinate system (PCS) of the spatial data (in this case, meters). If the values are too small, R will produce an enormous number of grid cells that could make your laptop crash.
So to create hexagons with a 250m perpendicular distance from the centre to each edge, we should use c(500, 500), since the cell size corresponds to twice that distance. This was also verified by measuring the hexagons on the map.
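As a quick arithmetic sanity check (a minimal sketch in base R; nothing here depends on the grid itself), a cell size of 500 should indeed give a 250m centre-to-edge distance:

# For st_make_grid() hexagons, the cell size is the distance between the
# centres of adjacent hexagons, i.e. twice the perpendicular (centre-to-edge)
# distance of each hexagon.
cellsize <- 500
apothem  <- cellsize / 2              # centre-to-edge distance: 250 m
side     <- 2 * apothem / sqrt(3)     # edge length: ~288.7 m
area     <- 2 * sqrt(3) * apothem^2   # hexagon area: ~216,506 sq m
c(apothem = apothem, side = side, area = area)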
area_honeycomb_grid = st_make_grid(busstops, c(500, 500), what = "polygons", square = FALSE)
# To sf and add grid ID
honeycomb_grid_sf = st_sf(area_honeycomb_grid) %>%
# add grid ID
mutate(grid_id = 1:length(lengths(area_honeycomb_grid)))

Number of Bus Stops per Hexagon
I decided to look at the number of bus stops within each hexagon as well.
This is to test the assumption that a hexagon with more bus stops should generate more trips originating from it, regardless of the time period.
So we use the following code chunk to count the number of bus stops within each hexagon, labelled by its grid_id.
# count number of points in each grid
# https://gis.stackexchange.com/questions/323698/counting-points-in-polygons-with-sf-package-of-r
honeycomb_grid_sf$n_busstops = lengths(st_intersects(honeycomb_grid_sf, busstops))
# remove grids with a value of 0 (i.e. no points inside that grid)
honeycomb_count = filter(honeycomb_grid_sf, n_busstops > 0)
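Before mapping, a quick tabular peek at the hexagons with the most bus stops (a small check using the honeycomb_count object created above):

# Hexagons with the largest number of bus stops
honeycomb_count %>%
  st_drop_geometry() %>%
  arrange(desc(n_busstops)) %>%
  head(5)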
Next, we will plot the map view.
Show the Code
tmap_mode("view")
map_honeycomb = tm_shape(honeycomb_count) +
tm_fill(
col = "n_busstops",
palette = "Blues",
style = "cont",
title = "Number of Bustops",
id = "grid_id",
showNA = FALSE,
alpha = 0.6,
popup.vars = c(
"No of Bus Stops: " = "n_busstops"
),
popup.format = list(
n_busstops = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)
map_honeycomb

Analysis on the Locations of the Bus Stops in Singapore
Based on the map that I have plotted above, I found out the following:
Grid 4820 has one of the highest numbers of bus stops, with 11. It is located along Pasir Ris Drive 1. Interestingly, it is not near any bus interchange or MRT station.
Grid 4635, located in the vicinity of Pasir Ris MRT station, has 10 bus stops. This is quite common across regions in Singapore, such as at Punggol, Sengkang and Hougang MRT stations.
Interestingly, an interchange is counted as two bus stops. For example, Pasir Ris Interchange is counted as two bus stops, with bus stop numbers 77008 and 77009 (see the quick check after these observations).
Another observation is that some of the grids with a higher number of bus stops are located near the entrances/exits of major expressways.
Based on these observations, I would think that the bus transport model is somewhat like a hub and spoke model. I will further analyse this assumption in the later parts.
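As a quick check of the interchange observation above (assuming BUS_STOP_N is stored as text in the BusStop shapefile, and using the stop codes quoted above):

# Pasir Ris Interchange appears as two separate bus stop records
busstops %>%
  filter(BUS_STOP_N %in% c("77008", "77009"))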
Import Passenger Data
I will import the passenger origin-destination data by using read_csv() from the readr package.
For the purpose of this exercise - we will use the data from Oct 2023.
odbus <- read_csv("data/aspatial/origin_destination_bus_202310.csv")
head(odbus, n=5)

# A tibble: 5 x 7
YEAR_MONTH DAY_TYPE TIME_PER_HOUR PT_TYPE ORIGIN_PT_CODE DESTINATION_PT_CODE
<chr> <chr> <dbl> <chr> <chr> <chr>
1 2023-10 WEEKENDS/~ 16 BUS 04168 10051
2 2023-10 WEEKDAY 16 BUS 04168 10051
3 2023-10 WEEKENDS/~ 14 BUS 80119 90079
4 2023-10 WEEKDAY 14 BUS 80119 90079
5 2023-10 WEEKDAY 17 BUS 44069 17229
# i 1 more variable: TOTAL_TRIPS <dbl>
I need to convert the ORIGIN_PT_CODE and DESTINATION_PT_CODE columns into factors to allow R to manipulate the data more easily.
odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE)

Check the Data for missing values
I will use the following code to check for missing values:
# Check for missing values
missing_values <- odbus %>%
summarise_all(~ sum(is.na(.)))
# Display columns with missing values
print("Columns with missing values:")[1] "Columns with missing values:"
print(missing_values)# A tibble: 1 x 7
YEAR_MONTH DAY_TYPE TIME_PER_HOUR PT_TYPE ORIGIN_PT_CODE DESTINATION_PT_CODE
<int> <int> <int> <int> <int> <int>
1 0 0 0 0 0 0
# i 1 more variable: TOTAL_TRIPS <int>
# Check for duplicate rows
duplicate_rows <- odbus %>%
filter(duplicated(.))
# Display duplicate rows
print("Duplicate rows:")[1] "Duplicate rows:"
print(duplicate_rows)# A tibble: 0 x 7
# i 7 variables: YEAR_MONTH <chr>, DAY_TYPE <chr>, TIME_PER_HOUR <dbl>,
# PT_TYPE <chr>, ORIGIN_PT_CODE <fct>, DESTINATION_PT_CODE <fct>,
# TOTAL_TRIPS <dbl>
Filter the data for the specific timeslots
Next, to answer the objectives, I need to filter the data according to the four different time slots.
origtrip_wd_am <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 &
           TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(Wd_AM = sum(TOTAL_TRIPS))

write_rds(origtrip_wd_am, "data/rds/origtrip_wd_am.rds")

origtrip_wd_pm <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 17 &
           TIME_PER_HOUR <= 20) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(Wd_PM = sum(TOTAL_TRIPS))

write_rds(origtrip_wd_pm, "data/rds/origtrip_wd_pm.rds")

origtrip_wend_am <- odbus %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 11 &
           TIME_PER_HOUR <= 14) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(We_AM = sum(TOTAL_TRIPS))

write_rds(origtrip_wend_am, "data/rds/origtrip_wend_am.rds")

origtrip_wend_pm <- odbus %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 16 &
           TIME_PER_HOUR <= 19) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(We_PM = sum(TOTAL_TRIPS))

write_rds(origtrip_wend_pm, "data/rds/origtrip_wend_pm.rds")

Visualising the Peak Hours on Map
To visualise the data on a map, I will join the data with the geospatial model.
summary_trips <- busstops %>%
left_join(origtrip_wd_am, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE")) %>%
left_join(origtrip_wd_pm, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE")) %>%
left_join(origtrip_wend_am, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE")) %>%
left_join(origtrip_wend_pm, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE"))

Check for any duplicate data:
duplicate <- summary_trips %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()

Next, I will combine the data with the hexagon grid that was created. At the same time, I will remove any grids with no values, i.e. no bus stops at that location or no trips during that time period.
trips_with_grid <- st_join(summary_trips, honeycomb_grid_sf)
# Sum the number of trips in each grid cell
summary_trips_with_grid <- trips_with_grid %>%
group_by(grid_id) %>%
summarise(sum_wd_am = sum(Wd_AM, na.rm = TRUE),
sum_wd_pm = sum(Wd_PM, na.rm = TRUE),
sum_we_am = sum(We_AM, na.rm = TRUE),
sum_we_pm = sum(We_PM, na.rm = TRUE))
summary_trips_with_hexagon <- st_join (honeycomb_grid_sf, summary_trips_with_grid)
# Remove grid cells without any trips
summary_trips_with_hexagon_count = summary_trips_with_hexagon %>%
filter(!is.na(sum_wd_am) & sum_wd_am > 0 |
!is.na(sum_wd_pm) & sum_wd_pm > 0 |
!is.na(sum_we_am) & sum_we_am > 0 |
!is.na(sum_we_pm) & sum_we_pm > 0)

Once the data is ready, I can plot the map. For this instance, I plot it with a comparison across all four time periods, which shows the difference in the number of trips per grid_id more clearly.
Show the code
map_honeycomb = tm_shape(summary_trips_with_hexagon_count) +
tm_fill(
col = c("sum_wd_am", "sum_wd_pm", "sum_we_am", "sum_we_pm"),
palette = "Blues",
style = "cont",
title = "Passenger Trips by Origin",
id = "grid_id",
showNA = FALSE,
alpha = 0.6,
popup.vars = c(
"Weekday Morning:" = "sum_wd_am", "Weekday Afternoon:" = "sum_wd_pm", "Weekend Morning:" = "sum_we_am", "Weekend Evenings:" = "sum_we_pm"
),
popup.format = list(
sum_wd_am = list(format = "f", digits = 0),
sum_wd_pm = list(format = "f", digits = 0),
sum_we_am = list(format = "f", digits = 0),
sum_we_pm = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)
map_honeycomb

I also use mapview() to create an overview.
Show the code
# Create an interactive map using mapview
mv_map <- mapview(
summary_trips_with_hexagon_count,
zcol = c("sum_wd_am", "sum_wd_pm", "sum_we_am", "sum_we_pm"),
popup = popupTable(summary_trips_with_hexagon_count,
zcol = c("grid_id.x", "sum_wd_am", "sum_wd_pm", "sum_we_am", "sum_we_pm"))
)
mv_map

Analysis of the Passenger Trips by Time Period and Hexagon
Looking at Time Period
Based on the data and analysis, it seems that passenger trips are affected by the time of the day (which is also the reason why we are looking at peak periods).
I can verify this by plotting a bar plot of the number of trips by TIME_PER_HOUR.
Show the code
odbus_time <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  group_by(TIME_PER_HOUR) %>%
  summarise(n_trips = sum(TOTAL_TRIPS))
ggplot(odbus_time, aes(x = TIME_PER_HOUR, y = n_trips, fill = TIME_PER_HOUR)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Passenger Trips by Hour (Weekdays)",
x = "Time Period", y = "Trip Count") +
theme_minimal()
There are two very distinct peak periods on weekdays.
The timings are slightly different on weekends, and the peaks are less pronounced.
Show the code
odbus_weekends <- odbus %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  group_by(TIME_PER_HOUR) %>%
  summarise(n_trips = sum(TOTAL_TRIPS))
ggplot(odbus_weekends, aes(x = TIME_PER_HOUR, y = n_trips, fill = TIME_PER_HOUR)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Passenger Trips by Hour (Weekend/Holidays)",
x = "Time Period", y = "Trip Count") +
theme_minimal()
Look at the Locations
To help us understand the patterns a bit better, I decided to look at the Pasir Ris region (since grid 4820 has the largest number of bus stops) to see whether it is true that grids with more bus stops generate more trips.
Based on the map analysis, this turns out not to be true.
The grid with the higher number of trips is Grid 4635, the one at Pasir Ris MRT station.
Additionally, grids with a high number of passenger trips continue to see high passenger trips across weekday mornings, weekday afternoons, and the weekends as well.
So I would think that the grids with the most connectivity will continue to see higher numbers of passenger trips, with the number of trips driven mainly by the time period of the day.
For example, the interchanges see a high number of passenger trips compared to their neighbouring bus stops. Also, if passengers are near an interchange, they may prefer to walk there instead of taking a bus.
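To back these observations with numbers, the short sketch below lists the hexagons with the most weekday morning trips; the grid_id.x and sum_* column names are those produced by the joins above:

# Top hexagons by weekday morning trips originating there
summary_trips_with_hexagon_count %>%
  st_drop_geometry() %>%
  select(grid_id = grid_id.x, n_busstops, sum_wd_am, sum_wd_pm, sum_we_am, sum_we_pm) %>%
  arrange(desc(sum_wd_am)) %>%
  head(10)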
Given these insights, I chose to perform the Emerging Hotspot Analysis instead.
Emerging Hotspot Analysis (EHSA)
Doing EHSA with the Four Time Periods
Loading the Packages
pacman::p_load(sf, sfdep, tmap, tidyverse, knitr, plotly)

Data Wrangling
I will do some data wrangling to allow the building of the space-time cube and the EHSA.
p_trip <- st_drop_geometry(summary_trips_with_grid)

df_long <- p_trip %>%
gather(key = "Period", value = "n_trips", -grid_id)
# Convert Period values to integers (1, 2, 3, 4)
df_long <- df_long %>%
mutate(Period = case_when(
Period == "sum_wd_am" ~ 1,
Period == "sum_wd_pm" ~ 2,
Period == "sum_we_am" ~ 3,
Period == "sum_we_pm" ~ 4
))

# Perform a semi-join to keep hexagons in honeycomb_grid_sf with a matching grid_id in df_long
result_df1 <- semi_join(honeycomb_grid_sf, df_long, by = "grid_id")
# View the resulting data frame
head(result_df1)

Simple feature collection with 6 features and 2 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3720.122 ymin: 27925.48 xmax: 4970.122 ymax: 31533.92
Projected CRS: SVY21 / Singapore TM
grid_id n_busstops area_honeycomb_grid
1 34 1 POLYGON ((3970.122 27925.48...
2 65 1 POLYGON ((4220.122 28358.49...
3 99 1 POLYGON ((4470.122 30523.55...
4 127 1 POLYGON ((4720.122 28358.49...
5 129 2 POLYGON ((4720.122 30090.54...
6 130 1 POLYGON ((4720.122 30956.57...
Creating the Space Time Cube
Using spacetime(), I will create the spatio-temporal cube to include time in our analysis.
ptrip_spt <- spacetime(df_long, result_df1,
.loc_col = "grid_id",
.time_col = "Period")
is_spacetime_cube(ptrip_spt)

[1] TRUE
Computing the Gi*
Note that we need to use a distance-based weight matrix here, so we will use st_inverse_distance().
ptrip_nb <- ptrip_spt %>%
activate("geometry") %>%
mutate(nb = include_self(st_contiguity(area_honeycomb_grid)),
wt = st_inverse_distance(nb, area_honeycomb_grid, scale = 1, alpha = 1),
.before = 1) %>%
set_nbs("nb") %>%
set_wts("wt")Grouping by the period, local_gstar_perm() will calculate the local Gi*.
gi_stars <- ptrip_nb %>%
group_by(Period) %>%
mutate(gi_star = local_gstar_perm(n_trips, nb, wt)) %>%
tidyr::unnest(gi_star)

Mann-Kendall Test
Next, we will conduct the Mann-Kendall Test to determine whether there is a trend in the passenger trips over the time periods and whether it is significant. We will do this for Grid 4635, in the vicinity of Pasir Ris Interchange.
cbg <- gi_stars |>
ungroup() |>
filter(grid_id == 4635) |>
select(grid_id, Period, gi_star)

ggplot(cbg, aes(Period, gi_star)) +
geom_line() +
theme_light()
cbg %>%
summarise(mk = list(
unclass(
Kendall::MannKendall(gi_star)))) %>%
tidyr::unnest_wider(mk)

# A tibble: 1 x 5
tau sl S D varS
<dbl> <dbl> <dbl> <dbl> <dbl>
1 0.333 0.734 2 6.00 8.67
In the above result, sl is the p-value. With a p-value of 0.734, well above 0.05, there is no significant trend.
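In fact, with only four time periods the Mann-Kendall test cannot reach the 5% level for any hexagon. A quick illustration with a perfectly monotone series of length four, the most extreme case possible:

# Even a strictly increasing series of 4 values (the largest possible S)
# still gives a two-sided p-value of roughly 0.09, so no grid can be
# significant at 5% over just four periods.
Kendall::MannKendall(c(1, 2, 3, 4))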
We can try to do this for all the grids using the following code chunk, but the analysis will be incomplete because of the time periods that we are examining.
Firstly, there is a difference in the number of passenger trips between weekdays and weekends.
Secondly, grouping by the peak periods gives an incomplete picture of the time series.
So for us to do a meaningful EHSA, we need the time series to be built by TIME_PER_HOUR, for weekdays and weekends/holidays separately.
ehsa <- gi_stars %>%
group_by(grid_id) %>%
summarise(mk = list(
unclass(
Kendall::MannKendall(gi_star)))) %>%
tidyr::unnest_wider(mk)

emerging <- ehsa %>%
arrange(sl, abs(tau)) %>%
slice(1:10)

EHSA Analysis
ehsa <- emerging_hotspot_analysis(
x = ptrip_spt,
.var = "n_trips",
k = 1,
nsim = 99
)

count(ehsa, classification)

# A tibble: 6 x 2
classification n
<chr> <int>
1 consecutive coldspot 97
2 new coldspot 221
3 no pattern detected 898
4 persistent coldspot 111
5 sporadic coldspot 47
6 sporadic hotspot 150
Show the code
ggplot(data = ehsa,
aes(x = classification)) +
geom_bar()
Show the code
ptrip_ehsa <- result_df1 %>%
left_join(ehsa,
by = join_by(grid_id == location))
ehsa_sig <- ptrip_ehsa %>%
filter(p_value < 0.1)
tmap_mode("view")
tm_shape(ptrip_ehsa) +
#tm_polygons() +
tm_borders(alpha = 0.5) +
tm_shape(ehsa_sig) +
tm_fill("classification") +
tm_borders(alpha = 0.4)

No proper trend identified!
EHSA based on Weekdays
As highlighted earlier, we need to perform the EHSA based on TIME_PER_HOUR for weekdays and weekends/holidays separately. I will tackle weekdays first.
Data Wrangling
I will need to do a bit of data wrangling so that I can create a proper space time cube for the subsequent analysis.
Firstly, I filter and group the data by ORIGIN_PT_CODE and TIME_PER_HOUR.
odbus_time2 <- odbus %>%
filter(DAY_TYPE == "WEEKDAY") %>%
group_by(ORIGIN_PT_CODE, TIME_PER_HOUR) %>%
summarise(n_trips = sum(TOTAL_TRIPS))

Joining the data with busstops.
odbus_time_geospatial <- busstops %>%
left_join(odbus_time2, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE"))
odbus_trips_with_grid <- st_join(odbus_time_geospatial, honeycomb_grid_sf)

odbus_grid <- odbus_trips_with_grid %>%
group_by(grid_id, TIME_PER_HOUR) %>%
summarise (total_trips = sum(n_trips))
odbus_grid <- st_drop_geometry(odbus_grid)
odbus_grid$TIME_PER_HOUR <- as.integer(odbus_grid$TIME_PER_HOUR)
odbus_grid_filtered = odbus_grid %>%
filter(!is.na(total_trips) & total_trips > 0)

# Create a reference data frame with all combinations of grid_id and TIME_PER_HOUR
reference_df <- expand.grid(
grid_id = unique(odbus_grid$grid_id),
TIME_PER_HOUR = 6:20
)
# Merge the reference data frame with the original data
df_merged <- merge(reference_df, odbus_grid, by = c("grid_id", "TIME_PER_HOUR"), all.x = TRUE)

# Group by grid_id and check if there is any missing hour
missing_hours <- df_merged %>%
group_by(grid_id) %>%
filter(any(is.na(total_trips)) | any(total_trips < 0) )
# Get unique grid_id values with missing hours
grid_ids_with_missing_hours <- unique(missing_hours$grid_id)
df_filtered <- df_merged %>%
filter(!(grid_id %in% grid_ids_with_missing_hours))

result_df2 <- semi_join(honeycomb_grid_sf, df_filtered, by = "grid_id")
# View the resulting data frame
head(result_df2)

Simple feature collection with 6 features and 2 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 4220.122 ymin: 28358.49 xmax: 5220.122 ymax: 32399.94
Projected CRS: SVY21 / Singapore TM
grid_id n_busstops area_honeycomb_grid
1 99 1 POLYGON ((4470.122 30523.55...
2 127 1 POLYGON ((4720.122 28358.49...
3 129 2 POLYGON ((4720.122 30090.54...
4 130 1 POLYGON ((4720.122 30956.57...
5 131 1 POLYGON ((4720.122 31822.59...
6 159 1 POLYGON ((4970.122 28791.5,...
Creating the Space Time Cube
ptrip_weekdays_spt <- spacetime(df_filtered, result_df2,
.loc_col = "grid_id",
.time_col = "TIME_PER_HOUR")
is_spacetime_cube(ptrip_weekdays_spt)

[1] TRUE
EHSA Analysis
I want to observe whether there is a temporal trend associated with the number of passenger trips, or whether the numbers are driven purely by the spatial context. This is done using emerging_hotspot_analysis().
ehsa <- emerging_hotspot_analysis(
x = ptrip_weekdays_spt,
.var = "total_trips",
k = 1,
nsim = 99
)

Distribution of EHSA Classes
ggplot(data = ehsa, aes(x = classification, fill = classification)) +
geom_bar(stat = "count") +
labs(title = "EHSA Classes",x = "Classification", y = "Count") +
scale_fill_brewer(palette = "Spectral") +
theme_minimal() +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank())
EHSA on the Map
I join the data so that we can see the trends on the map.
ptrips_weekdays_ehsa <- honeycomb_grid_sf %>%
left_join(ehsa,
by = join_by(grid_id == location))
ptrips_weekdays_ehsa_filtered <- ptrips_weekdays_ehsa %>%
filter(!is.na(classification))

ehsa_sig <- ptrips_weekdays_ehsa_filtered %>%
filter(p_value < 0.05)
tmap_mode("view")
tm_shape(ptrips_weekdays_ehsa_filtered) +
tm_polygons() +
tm_borders(alpha = 0.5) +
tm_shape(ehsa_sig) +
tm_fill("classification", palette = "Spectral") +
tm_borders(alpha = 0.4)

Analysis of EHSA (Weekdays)
From the map, I note that there are persistent coldspots in the west near the Tuas area, as these are mainly industrial areas. There are also some at the Sungei Kadut industrial area, along Mandai Road, and along Tampines Road (near Paya Lebar Airbase), which are very "ulu" (remote) areas in Singapore.
Unsurprisingly, there is no persistent hotspot detected, indicating that the number of passenger trips does change over time. Instead, we see more sporadic hotspots or coldspots over the day, at regional towns such as:
Near Woodlands Checkpoint, Woodlands MRT / Interchange, Yishun
Punggol, Sengkang
Tampines, Bedok
Serangoon, Toa Payoh
Bukit Batok, Bukit Panjang, Jurong East
CBD area
Computing Gi*
- This is where I am facing errors.
ptrip_weekdays_nb <- ptrip_weekdays_spt %>%
activate("geometry") %>%
mutate(nb = include_self(st_contiguity(area_honeycomb_grid)),
wt = st_inverse_distance(nb, area_honeycomb_grid, scale = 1, alpha = 1),
.before = 1 )%>%
set_nbs("nb") %>%
set_wts("wt")#gi_stars <- ptrip_weekdays_nb %>%
# group_by(TIME_PER_HOUR) %>%
# mutate(gi_star = local_gstar_perm(total_trips, nb, wt)) %>%
# tidyr::unnest(gi_star)
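While I have not resolved the error above, one possible way to sidestep it is to compute Gi* one hour at a time directly on the hexagon layer instead of going through the space-time cube. The sketch below is illustrative only: the hour chosen and the object names come from the objects created above, and for simplicity it uses row-standardised contiguity weights (st_weights) rather than the inverse-distance weights.

# Illustrative sketch: local Gi* for a single weekday hour (8am)
wd_8am <- df_filtered %>%
  filter(TIME_PER_HOUR == 8)

hex_8am <- result_df2 %>%
  left_join(wd_8am, by = "grid_id")

nb_8am <- include_self(st_contiguity(st_geometry(hex_8am)))  # isolated hexagons keep only themselves
wt_8am <- st_weights(nb_8am, style = "W")                    # row-standardised weights

gi_8am <- local_gstar_perm(hex_8am$total_trips, nb_8am, wt_8am, nsim = 99)
head(gi_8am)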
EHSA Based on Weekends
Similar Data Wrangling
I will need to do a bit of data wrangling so that I can create a proper space time cube for the subsequent analysis.
Firstly, I filter and group the data by ORIGIN_PT_CODE and TIME_PER_HOUR.
odbus_time3 <- odbus %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
group_by(ORIGIN_PT_CODE, TIME_PER_HOUR) %>%
summarise(n_trips = sum(TOTAL_TRIPS))

Joining the data with busstops.
odbus_weekends_geospatial <- busstops %>%
left_join(odbus_time3, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE"))
odbus_weekend_grid <- st_join(odbus_weekends_geospatial, honeycomb_grid_sf)

odbus_weekend <- odbus_weekend_grid %>%
group_by(grid_id, TIME_PER_HOUR) %>%
summarise (total_trips = sum(n_trips))
odbus_weekend <- st_drop_geometry(odbus_weekend)
odbus_weekend$TIME_PER_HOUR <- as.integer(odbus_weekend$TIME_PER_HOUR)
odbus_weekend_filtered = odbus_weekend %>%
filter(!is.na(total_trips) & total_trips > 0)

# Create a reference data frame with all combinations of grid_id and TIME_PER_HOUR
weekend_df <- expand.grid(
grid_id = unique(odbus_weekend$grid_id),
TIME_PER_HOUR = 6:20
)
# Merge the reference data frame with the original
df_weekend <- merge(weekend_df, odbus_weekend, by = c("grid_id", "TIME_PER_HOUR"), all.x = TRUE)

missing_hours_weekend <- df_weekend %>%
group_by(grid_id) %>%
filter(any(is.na(total_trips)) | any(total_trips < 0) )
# Get unique grid_id values with missing hours
grid_ids_weekend_with_missing_hours <- unique(missing_hours_weekend$grid_id)
df_weekend_filtered <- df_weekend %>%
filter(!(grid_id %in% grid_ids_weekend_with_missing_hours))

result_df3 <- semi_join(honeycomb_grid_sf, df_weekend_filtered, by = "grid_id")

# View the resulting data frame
# head(result_df3)

Creating the Space Time Cube
ptrip_weekends_spt <- spacetime(df_weekend_filtered, result_df3,
.loc_col = "grid_id",
.time_col = "TIME_PER_HOUR")
is_spacetime_cube(ptrip_weekends_spt)

[1] TRUE
Computing the Gi*
ptrip_weekends_nb <- ptrip_weekends_spt %>%
activate("geometry") %>%
mutate(nb = include_self(st_contiguity(area_honeycomb_grid)),
wt = st_inverse_distance(nb, area_honeycomb_grid, scale = 1, alpha = 1),
.before = 1 )%>%
set_nbs("nb") %>%
set_wts("wt")#gi_stars <- ptrip_weekends_nb %>%
# group_by(TIME_PER_HOUR) %>%
# mutate(gi_star = local_gstar_perm(total_trips, nb, wt)) %>%
# tidyr::unnest(gi_star)

EHSA Analysis
ehsa_weekends <- emerging_hotspot_analysis(
x = ptrip_weekends_spt,
.var = "total_trips",
k = 1,
nsim = 99
)

Distribution of EHSA Classes
Show the code
ggplot(data = ehsa_weekends, aes(x = classification, fill = classification)) +
geom_bar(stat = "count") +
labs(title = "EHSA Classes",x = "Classification", y = "Count") +
scale_fill_brewer(palette = "Spectral") +
theme_minimal() +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank())
EHSA on the Map (Weekends/Holidays)
Show the code
ptrips_weekends_ehsa <- honeycomb_grid_sf %>%
left_join(ehsa_weekends,
by = join_by(grid_id == location))
ptrips_weekends_ehsa_filtered <- ptrips_weekends_ehsa %>%
filter(!is.na(classification))

Show the code
ehsa_weekend_sig <- ptrips_weekends_ehsa_filtered %>%
filter(p_value < 0.05)
tmap_mode("view")
tm_shape(ptrips_weekends_ehsa_filtered) +
tm_polygons() +
tm_borders(alpha = 0.5) +
tm_shape(ehsa_weekend_sig) +
tm_fill("classification", palette = "Spectral") +
tm_borders(alpha = 0.4)

Analysis of EHSA (Weekends)
Compared with the weekdays, the distribution of classifications is broadly similar, but with fewer sporadic hotspots.
In terms of the regions, the pattern seems similar to that of weekdays at the respective town regions. This is understandable, given that the human flow from these regions would be largely the same.
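To make the comparison concrete, the class counts from the two analyses can be placed side by side (a small sketch, assuming ehsa still holds the weekday result from the previous section):

# Compare EHSA class counts: weekdays vs weekends/holidays
bind_rows(
  count(ehsa, classification) %>% mutate(day_type = "Weekday"),
  count(ehsa_weekends, classification) %>% mutate(day_type = "Weekend/Holiday")
) %>%
  pivot_wider(names_from = day_type, values_from = n)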
Conclusions
In this exercise, I have looked at the passenger trips data and explored the geospatial association.
I have noted that urban mobility in Singapore is somewhat like a hub and spoke model, with hubs at the various town regions, which are better connected to other transport options, be it other bus services at the interchanges or the MRT.
To fully understand the urban mobility in Singapore, I think we would also need to have data on the MRT and the number of passengers at these stations to have a more complete picture.
Another area we can further explore is to compare data from two different periods, i.e. school holidays (in Jun or Dec) versus non-school-holiday periods, to discern whether travel patterns differ between work-related and school-related trips.
We could also try to visualise the flow patterns of people in this scenario - which I think will be covered as part of Lesson 3 and Hands-On Ex 3.